Results 1-20 of 231
1.
NPJ Digit Med ; 7(1): 117, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714751

ABSTRACT

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: intensive care unit (ICU) admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu₁, Zv₁) = 0.596, p < 0.001). Among radiomics features, histogram-based first-order features reporting skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the most positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task, yielding area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively.
Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
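Sparse CCA, as used above, looks for sparse weight vectors u₁ and v₁ whose canonical variates Xu₁ and Zv₁ are maximally correlated. The following is a minimal pure-Python sketch and not the authors' pipeline. It assumes column-standardized views and runs alternating soft-thresholded power iterations on XᵀZ. Fixed thresholds `lam_u` and `lam_v` (hypothetical parameters) stand in for the tuned L1 bounds a full implementation would use. The toy data are synthetic.

```python
import math
import random


def standardize(M):
    """Centre each column of a row-major matrix and scale it to unit variance."""
    n, m = len(M), len(M[0])
    out = [row[:] for row in M]
    for j in range(m):
        col = [row[j] for row in M]
        mu = sum(col) / n
        sd = math.sqrt(sum((c - mu) ** 2 for c in col) / n)
        for i in range(n):
            out[i][j] = (M[i][j] - mu) / sd
    return out


def soft_threshold(vec, lam):
    """Elementwise sign(a) * max(|a| - lam, 0); this is what zeroes out weights."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in vec]


def normalize(vec):
    """Scale to unit Euclidean norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else vec


def sparse_cca(X, Z, lam_u=0.0, lam_v=0.0, n_iter=50):
    """Alternating soft-thresholded power iterations on C = X^T Z.

    Returns unit-norm weights u (one per column of X) and v (one per column of Z).
    lam_u and lam_v are fixed thresholds, a simplification of tuning an L1 bound.
    """
    p, q = len(X[0]), len(Z[0])
    C = [[sum(xr[j] * zr[k] for xr, zr in zip(X, Z)) for k in range(q)]
         for j in range(p)]
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(n_iter):
        u = normalize(soft_threshold(
            [sum(C[j][k] * v[k] for k in range(q)) for j in range(p)], lam_u))
        v = normalize(soft_threshold(
            [sum(C[j][k] * u[j] for j in range(p)) for k in range(q)], lam_v))
    return u, v


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Toy data: a shared latent signal drives column 0 of X and column 1 of Z.
random.seed(0)
t = [random.gauss(0, 1) for _ in range(200)]
X = standardize([[ti + 0.3 * random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)]
                 for ti in t])
Z = standardize([[random.gauss(0, 1), ti + 0.3 * random.gauss(0, 1)] for ti in t])
u, v = sparse_cca(X, Z, lam_u=30.0, lam_v=30.0)
Xu = [sum(a * b for a, b in zip(row, u)) for row in X]
Zv = [sum(a * b for a, b in zip(row, v)) for row in Z]
print("u =", [round(a, 3) for a in u], "v =", [round(b, 3) for b in v])
print("corr(Xu, Zv) =", round(pearson(Xu, Zv), 3))
```

Larger thresholds drive more entries of u and v to exactly zero. In this toy setting, the surviving weights should sit on the two columns that share the latent signal.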

2.
JAMA Netw Open ; 7(3): e242684, 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38517441

ABSTRACT

Importance: Surgery with complete tumor resection remains the main treatment option for patients with breast cancer. Yet current technologies are limited in providing accurate in vivo assessment of breast tissue, warranting the development of new technologies for surgical guidance.
Objective: To evaluate the performance of the MasSpec Pen for accurate intraoperative assessment of breast tissues and surgical margins based on metabolic and lipid information.
Design, Setting, and Participants: In this diagnostic study conducted between February 23, 2017, and August 19, 2021, the mass spectrometry-based device was used to analyze healthy breast and invasive ductal carcinoma (IDC) banked tissue samples from adult patients undergoing breast surgery for ductal carcinomas or nonmalignant conditions. Fresh-frozen tissue samples and touch imprints were analyzed in a laboratory. Intraoperative in vivo and ex vivo breast tissue analyses were performed by surgical staff in operating rooms (ORs) within 2 different hospitals at the Texas Medical Center. Molecular data were used to build statistical classifiers.
Main Outcomes and Measures: Prediction results of tissue analyses from classification models were compared with gross assessment, frozen section analysis, and/or final postoperative pathology to assess accuracy.
Results: All data acquired from the 143 banked tissue samples, including 79 healthy breast and 64 IDC tissues, were included in the statistical analysis. The data showed rich molecular profiles of healthy and IDC banked tissue samples, with significant changes in relative abundances observed for several metabolic species. Statistical classifiers yielded accuracies of 95.6%, 95.5%, and 90.6% for training, validation, and independent test sets, respectively. A total of 25 participants enrolled in the clinical intraoperative study; all were female, and the median age was 58 years (IQR, 44-66 years). Intraoperative testing of the technology was successfully performed by surgical staff during 25 breast operations. Of 273 intraoperative analyses performed during 25 surgical cases, 147 analyses from 22 cases were subjected to statistical classification. Testing of the classifiers on 147 intraoperative mass spectra yielded 95.9% agreement with postoperative pathology results.
Conclusions and Relevance: The findings of this diagnostic study suggest that the mass spectrometry-based system could be clinically valuable to surgeons and patients by enabling fast molecular-based intraoperative assessment of in vivo and ex vivo breast tissue samples and surgical margins.


Subjects
Breast Neoplasms , Adult , Female , Humans , Middle Aged , Male , Breast Neoplasms/diagnosis , Breast Neoplasms/surgery , Breast Neoplasms/pathology , Margins of Excision , Breast/surgery , Breast/pathology , Mastectomy , Mass Spectrometry
3.
Blood Adv ; 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38498731

ABSTRACT

Chimeric antigen receptor (CAR) T cells directed against CD19 (CAR19) are a revolutionary treatment for B-cell lymphomas. CAR19 cell expansion is necessary for CAR19 function but is also associated with toxicity. To define the impact of CAR19 expansion on patient outcomes, we prospectively followed a cohort of 236 patients treated with CAR19 (brexucabtagene autoleucel or axicabtagene ciloleucel) for mantle cell lymphoma (MCL), follicular lymphoma (FL), or large B-cell lymphoma (LBCL) over the course of five years. We obtained CAR19 expansion data using peripheral blood immunophenotyping for 188 of these patients. CAR19 expansion was higher in patients with MCL than in patients with other lymphoma histologic subtypes. Notably, patients with MCL had increased toxicity and required fourfold higher cumulative steroid doses than patients with LBCL. CAR19 expansion was associated with the development of cytokine release syndrome (CRS), immune effector cell-associated neurotoxicity syndrome (ICANS), and the requirement for granulocyte colony-stimulating factor (GCSF) after day 14 post-infusion. Younger patients and those with elevated lactate dehydrogenase (LDH) had significantly higher CAR19 expansion. In general, no association between CAR19 expansion and LBCL treatment response was observed. However, when controlling for tumor burden, we found that lower CAR19 expansion in conjunction with low LDH was associated with improved outcomes in LBCL. In sum, this study finds that CAR19 expansion is principally associated with CAR-related toxicity. Additionally, CAR19 expansion as measured by peripheral blood immunophenotyping may be dispensable for favorable outcomes in LBCL.

4.
bioRxiv ; 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-35547855

ABSTRACT

Clinical diagnosis typically incorporates physical examination, patient history, and various laboratory tests and imaging studies. However, it makes limited use of the human immune system's own record of antigen exposures, which is encoded by receptors on B cells and T cells. We analyzed immune receptor datasets from 593 individuals to develop MAchine Learning for Immunological Diagnosis (Mal-ID), an interpretive framework to screen for multiple illnesses simultaneously or precisely test for one condition. This approach detects specific infections, autoimmune disorders, vaccine responses, and disease severity differences. Human-interpretable features of the model recapitulate known immune responses to SARS-CoV-2, influenza, and HIV, highlight antigen-specific receptors, and reveal distinct characteristics of systemic lupus erythematosus and type 1 diabetes autoreactivity. This analysis framework has broad potential for scientific and clinical interpretation of human immune responses.

5.
Stat Med ; 43(5): 855-868, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38111969

ABSTRACT

The main objective of most clinical trials is to estimate the effect of some treatment compared to a control condition. We define the signal-to-noise ratio (SNR) as the ratio of the true treatment effect to the SE of its estimate. In a previous publication in this journal, we estimated the distribution of the SNR among the clinical trials in the Cochrane Database of Systematic Reviews (CDSR). We found that the SNR is often low, which implies that the power against the true effect is also low in many trials. Here we use the fact that the CDSR is a collection of meta-analyses to quantitatively assess the consequences. Among trials that have reached statistical significance we find considerable overoptimism of the usual unbiased estimator and under-coverage of the associated confidence interval. Previously, we have proposed a novel shrinkage estimator to address this "winner's curse." We compare the performance of our shrinkage estimator to the usual unbiased estimator in terms of the root mean squared error, the coverage and the bias of the magnitude. We find superior performance of the shrinkage estimator both conditionally and unconditionally on statistical significance.
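The SNR is the true effect divided by the standard error of its estimate, so it fixes the power of the usual test directly. The following is a minimal sketch. It assumes a normally distributed estimate and a two-sided z-test at α = 0.05, so that power = Φ(SNR - 1.96) + Φ(-SNR - 1.96). The function names are illustrative. The second function simulates the winner's curse described above: among estimates that reach significance, the average |estimate| overstates the true effect, and more so when the SNR is low. It does not implement the authors' shrinkage estimator.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_from_snr(snr, alpha=0.05):
    """Power of a two-sided z-test when the true effect is `snr` standard errors."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(snr - z) + STD_NORMAL.cdf(-snr - z)


def exaggeration_if_significant(snr, alpha=0.05, n_sim=100_000, seed=1):
    """Monte Carlo mean of |estimate| / true effect among significant results.

    Estimates are drawn as N(snr, 1), i.e. measured in units of their SE.
    """
    rng = random.Random(seed)
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    draws = (rng.gauss(snr, 1.0) for _ in range(n_sim))
    significant = [abs(b) for b in draws if abs(b) > z]
    return sum(significant) / len(significant) / snr


for snr in (0.5, 1.0, 2.8):
    print(f"SNR={snr}: power={power_from_snr(snr):.3f}, "
          f"exaggeration if significant={exaggeration_if_significant(snr):.2f}")
```

At SNR = 1, every significant estimate exceeds 1.96 in SE units. The conditional average therefore overstates the true effect by roughly a factor of two or more.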


Subjects
Clinical Trials as Topic , Humans , Bias , Systematic Reviews as Topic , Meta-Analysis as Topic
6.
Res Sq ; 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-38045288

ABSTRACT

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu₁, Zv₁) = 0.596, p < 0.001). Among radiomics features, histogram-based first-order features reporting skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the most positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task, yielding area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively.
Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.

7.
Sci Rep ; 13(1): 16196, 2023 09 27.
Article in English | MEDLINE | ID: mdl-37758827

ABSTRACT

The COVID-19 pandemic has taken a devastating toll around the world. The World Health Organization estimates that 14.9 million excess deaths have occurred globally since January 2020. Despite this grim number quantifying the deadly impact, the underlying factors contributing to COVID-19 deaths at the population level remain unclear. Prior studies indicate that demographic factors, such as the proportion of the population older than 65, and population health explain the cross-country differences in COVID-19 deaths. However, there has not been a comprehensive analysis that includes variables describing government policies and COVID-19 vaccination rates. Furthermore, prior studies assess the impact of the pandemic through COVID-19 deaths rather than excess deaths. Through a robust statistical modeling framework, we analyze 80 countries. We show that actionable public health efforts, beyond the factors intrinsic to each country, are important for explaining the cross-country heterogeneity in excess deaths.


Subjects
COVID-19 , Pandemics , Humans , COVID-19/epidemiology , COVID-19 Vaccines , Public Health , Government
8.
Stat Med ; 42(25): 4532-4541, 2023 11 10.
Article in English | MEDLINE | ID: mdl-37580906

ABSTRACT

Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. It has been shown that standard confidence intervals for test error using estimates from CV may have coverage below nominal levels. This occurs because each sample is used in both the training and testing procedures during CV, and as a result the CV estimates of the errors become correlated. Without accounting for this correlation, the variance estimate is smaller than it should be. One way to mitigate this issue is to use nested CV to estimate the mean squared error of the prediction error instead. This approach has been shown to achieve better coverage than intervals derived from standard CV. In this work, we generalize the nested CV idea to the Cox proportional hazards model and explore various choices of test error for this setting.
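For reference, the standard interval that can under-cover treats the n held-out losses from K-fold CV as if they were independent: mean loss ± z·sd/√n. The sketch below computes that naive interval. It assumes squared-error loss, a hypothetical `fit`/`predict` pair, and a toy least-squares line. Nested CV, which the paper extends to the Cox model, replaces the sd/√n term with an estimate that accounts for the correlation between folds. That step is not reproduced here.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def kfold_losses(x, y, fit, predict, k=5, seed=0):
    """Return one held-out squared-error loss per sample from k-fold CV."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    losses = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            losses[i] = (y[i] - predict(model, x[i])) ** 2
    return losses


def naive_cv_interval(losses, level=0.90):
    """Standard CV interval: treats the per-sample losses as independent."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    err = mean(losses)
    half_width = z * stdev(losses) / math.sqrt(len(losses))
    return err - half_width, err + half_width


def fit_line(xs, ys):
    """Hypothetical toy model: ordinary least-squares line."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return my - slope * mx, slope


def predict_line(model, xi):
    intercept, slope = model
    return intercept + slope * xi


rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(100)]
ys = [2 * a + rng.gauss(0, 1) for a in xs]
losses = kfold_losses(xs, ys, fit_line, predict_line)
lo, hi = naive_cv_interval(losses)
print(f"CV error {mean(losses):.3f}, naive 90% interval ({lo:.3f}, {hi:.3f})")
```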


Subjects
Research Design , Humans , Proportional Hazards Models , Confidence Intervals
9.
Res Sq ; 2023 Jun 02.
Article in English | MEDLINE | ID: mdl-37398389

ABSTRACT

Microglia are implicated in aging, neurodegeneration, and Alzheimer's disease (AD). Traditional low-plex imaging methods fall short of capturing in situ cellular states and interactions in the human brain. We used Multiplexed Ion Beam Imaging (MIBI) and data-driven analysis to spatially map proteomic cellular states and niches in the healthy human brain. This identified a spectrum of microglial profiles called the microglial state continuum (MSC). The MSC ranged from senescent-like to active proteomic states, which were skewed across large brain regions and compartmentalized locally according to their immediate microenvironment. More active microglial states were proximal to amyloid plaques. Globally, however, microglia shifted significantly toward a presumably dysfunctional low-MSC state in the AD hippocampus, as confirmed in an independent cohort (n = 26). This provides an in situ single-cell framework for mapping human microglial states along a continuous, shifting existence that is differentially enriched between healthy brain regions and disease, reinforcing differential microglial functions overall.

10.
JCO Precis Oncol ; 7: e2200668, 2023 06.
Article in English | MEDLINE | ID: mdl-37285559

ABSTRACT

PURPOSE: Accurately distinguishing renal cell carcinoma (RCC) from normal kidney tissue is critical for identifying positive surgical margins (PSMs) during partial and radical nephrectomy, which remains the primary intervention for localized RCC. Techniques that detect PSMs with higher accuracy and faster turnaround time than intraoperative frozen section (IFS) analysis can help decrease reoperation rates, reduce patient anxiety and costs, and potentially improve patient outcomes.
MATERIALS AND METHODS: Here, we extended our combined desorption electrospray ionization mass spectrometry imaging (DESI-MSI) and machine learning methodology to identify metabolite and lipid species from tissue surfaces that can distinguish normal tissues from clear cell RCC (ccRCC), papillary RCC (pRCC), and chromophobe RCC (chRCC) tissues.
RESULTS: From 24 normal and 40 renal cancer (23 ccRCC, 13 pRCC, and 4 chRCC) tissues, we developed a multinomial lasso classifier. It selects 281 analytes from over 27,000 detected molecular species and distinguishes all histological subtypes of RCC from normal kidney tissues with 84.5% accuracy. On independent test data reflecting distinct patient populations, the classifier achieves 85.4% and 91.2% accuracy on a Stanford test set (20 normal and 28 RCC) and a Baylor-UT Austin test set (16 normal and 41 RCC), respectively. Most of the model's selected features show consistent trends across data sets, affirming its stable performance. Suppression of arachidonic acid metabolism is identified as a shared molecular feature of ccRCC and pRCC.
CONCLUSION: Together, these results indicate that signatures derived from DESI-MSI combined with machine learning may be used to rapidly determine surgical margin status with accuracies that meet or exceed those reported for IFS.


Subjects
Carcinoma, Renal Cell , Kidney Neoplasms , Humans , Carcinoma, Renal Cell/diagnostic imaging , Kidney/diagnostic imaging , Kidney/surgery , Kidney/metabolism , Kidney Neoplasms/diagnostic imaging , Kidney Neoplasms/surgery , Mass Spectrometry , Machine Learning
11.
Stat Sin ; 33(1): 259-279, 2023 Jan.
Article in English | MEDLINE | ID: mdl-37102071

ABSTRACT

In some supervised learning settings, the practitioner might have additional information on the features used for prediction. We propose a new method which leverages this additional information for better prediction. The method, which we call the feature-weighted elastic net ("fwelnet"), uses these "features of features" to adapt the relative penalties on the feature coefficients in the elastic net penalty. In our simulations, fwelnet outperforms the lasso in terms of test mean squared error and usually gives an improvement in true positive rate or false positive rate for feature selection. We also apply this method to early prediction of preeclampsia, where fwelnet outperforms the lasso in terms of 10-fold cross-validated area under the curve (0.86 vs. 0.80). We also provide a connection between fwelnet and the group lasso and suggest how fwelnet might be used for multi-task learning.
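The mechanism can be sketched as per-feature penalty factors in an elastic net. The block below assumes fwelnet's weights take the form w_j(θ) = Σ_l exp(F_lᵀθ) / (p·exp(F_jᵀθ)), where row F_j holds the "features of features" of feature j. Under this form, a feature with a larger score F_jᵀθ is penalized less. At θ = 0, every weight is 1 and the plain elastic net is recovered. The block also shows the closed-form coordinate-descent update for one standardized feature under the objective (1/2n)‖y - Xβ‖² + λ Σ_j w_j(α|β_j| + (1 - α)β_j²/2). The outer step that learns θ is omitted, and all names are hypothetical.

```python
import math


def fwelnet_weights(F, theta):
    """Per-feature penalty weights from a p x K matrix F of "features of features".

    Assumed form: w_j = sum_l exp(F_l . theta) / (p * exp(F_j . theta)),
    so features with a larger score F_j . theta receive a smaller penalty.
    """
    scores = [math.exp(sum(f * t for f, t in zip(row, theta))) for row in F]
    total = sum(scores)
    p = len(F)
    return [total / (p * s) for s in scores]


def weighted_elastic_net_penalty(beta, weights, lam, alpha):
    """lam * sum_j w_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j^2)."""
    return lam * sum(
        w * (alpha * abs(b) + (1 - alpha) / 2 * b * b) for b, w in zip(beta, weights)
    )


def coordinate_update(z_j, w_j, lam, alpha):
    """Closed-form coordinate-descent step for one standardized feature.

    z_j is the partial-residual inner product (1/n) x_j^T r_j; a larger
    weight w_j means stronger soft-thresholding and stronger ridge shrinkage.
    """
    shrunk = math.copysign(max(abs(z_j) - lam * alpha * w_j, 0.0), z_j)
    return shrunk / (1 + lam * (1 - alpha) * w_j)


# Hypothetical: one feature-of-feature, 1.0 = flagged as relevant by prior knowledge.
F = [[1.0], [0.0], [0.0]]
for theta in ([0.0], [1.0]):
    print(theta, [round(w, 3) for w in fwelnet_weights(F, theta)])
```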

12.
Cell Syst ; 14(3): 196-209.e6, 2023 03 15.
Article in English | MEDLINE | ID: mdl-36827986

ABSTRACT

Maintaining persistent migration in complex environments is critical for neutrophils to reach infection sites. Neutrophils avoid getting trapped, even when obstacles split their front into multiple leading edges. How they re-establish polarity to move productively while incorporating receptor inputs under such conditions remains unclear. Here, we challenge chemotaxing HL60 neutrophil-like cells with symmetric bifurcating microfluidic channels to probe cell-intrinsic processes during the resolution of competing fronts. Using supervised statistical learning, we demonstrate that cells commit to one leading edge late in the process, rather than amplifying structural asymmetries or early fluctuations. Using optogenetic tools, we show that receptor inputs only bias the decision similarly late, once mechanical stretching begins to weaken each front. Finally, a retracting edge commits to retraction, with ROCK limiting sensitivity to receptor inputs until the retraction completes. Collectively, our results suggest that cell edges locally adopt highly stable protrusion/retraction programs that are modulated by mechanical feedback.


Subjects
Carrier Proteins , Neutrophils , Neutrophils/physiology , Cell Movement/physiology
13.
Sci Adv ; 9(3): eadd1166, 2023 01 20.
Article in English | MEDLINE | ID: mdl-36662860

ABSTRACT

Although the literature suggests that resistance to TNF inhibitor (TNFi) therapy in patients with ulcerative colitis (UC) is partially linked to immune cell populations in the inflamed region, there is still substantial uncertainty about the relevant spatial context. Here, we used the highly multiplexed immunofluorescence imaging technology CODEX to create a publicly browsable tissue atlas of inflammation in 42 tissue regions from 29 patients with UC and 5 healthy individuals. We analyzed 52 biomarkers on 1,710,973 spatially resolved single cells to determine cell types, cell-cell contacts, and cellular neighborhoods. We observed that cellular functional states are associated with cellular neighborhoods. We further observed that a subset of inflammatory cell types and cellular neighborhoods are present in TNFi-treated patients with UC, potentially indicating resistant niches. Last, we explored applying convolutional neural networks (CNNs) to our dataset with respect to patient clinical variables. We note concerns and offer guidelines for reporting CNN-based predictions in similar datasets.


Subjects
Colitis, Ulcerative , Humans , Colitis, Ulcerative/drug therapy , Colitis, Ulcerative/complications , Tumor Necrosis Factor Inhibitors/therapeutic use , Inflammation/complications , Biomarkers
14.
Nat Commun ; 13(1): 6646, 2022 11 04.
Article in English | MEDLINE | ID: mdl-36333296

ABSTRACT

While food allergy oral immunotherapy (OIT) can provide safe and effective desensitization (DS), the immune mechanisms underlying the development of sustained unresponsiveness (SU) following a period of avoidance are largely unknown. Here, we compare high-dimensional phenotypes of innate and adaptive immune cell subsets in participants of a previously reported phase 2 randomized, controlled peanut OIT trial who achieved SU vs. DS. These outcomes correspond to no allergic reaction vs. an allergic reaction upon food challenge after a withdrawal period (n = 21 vs. 30, respectively, among 120 intent-to-treat participants in total). Lower frequencies of naïve CD8+ T cells and terminally differentiated CD57+CD8+ T cell subsets at baseline (pre-OIT) are associated with SU. The frequency of naïve CD8+ T cells shows a significant positive correlation with peanut-specific and Ara h 2-specific IgE levels at baseline. Higher frequencies of IL-4+ and IFNγ+ CD4+ T cells post-OIT are negatively correlated with SU. Our findings provide evidence that an immune signature consisting of certain CD8+ T cell subset frequencies is potentially predictive of SU following OIT.


Subjects
Peanut Hypersensitivity , Peanut Hypersensitivity/therapy , Desensitization, Immunologic/methods , Immunoglobulin E , CD8-Positive T-Lymphocytes , Feasibility Studies , Administration, Oral , Arachis , Allergens , Immunologic Factors , Cell Differentiation
15.
Nat Med ; 28(9): 1860-1871, 2022 09.
Article in English | MEDLINE | ID: mdl-36097223

ABSTRACT

Approximately 60% of patients with large B cell lymphoma treated with chimeric antigen receptor (CAR) T cell therapies targeting CD19 experience disease progression, and neurotoxicity remains a challenge. Biomarkers associated with resistance and toxicity are limited. In this study, single-cell proteomic profiling of circulating CAR T cells in 32 patients treated with CD19-CAR identified that CD4+Helios+ CAR T cells on day 7 after infusion are associated with progressive disease and less severe neurotoxicity. Deep profiling demonstrated that this population is non-clonal and manifests hallmark features of T regulatory (TReg) cells. Validation cohort analysis upheld the link between higher CAR TReg cells with clinical progression and less severe neurotoxicity. A model combining expansion of this subset with lactate dehydrogenase levels, as a surrogate for tumor burden, was superior for predicting durable clinical response compared to models relying on each feature alone. These data credential CAR TReg cell expansion as a novel biomarker of response and toxicity after CAR T cell therapy and raise the prospect that this subset may regulate CAR T cell responses in humans.


Subjects
Neurotoxicity Syndromes , Receptors, Chimeric Antigen , Antigens, CD19 , Humans , Immunotherapy, Adoptive/adverse effects , Immunotherapy, Adoptive/methods , Lactate Dehydrogenases , Neurotoxicity Syndromes/etiology , Proteomics , Receptors, Antigen, T-Cell
16.
Proc Natl Acad Sci U S A ; 119(38): e2202113119, 2022 09 20.
Article in English | MEDLINE | ID: mdl-36095183

ABSTRACT

We propose a method for supervised learning with multiple sets of features ("views"). The multiview problem is especially important in biology and medicine, where "-omics" data, such as genomics, proteomics, and radiomics, are measured on a common set of samples. "Cooperative learning" combines the usual squared-error loss of predictions with an "agreement" penalty to encourage the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that include the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error. One version of our fitting procedure is modular, where one can choose different fitting mechanisms (e.g., lasso, random forests, boosting, or neural networks) appropriate for different data views. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity. The method can be especially powerful when the different data views share some underlying relationship in their signals that can be exploited to boost the signals. We show that cooperative learning achieves higher predictive accuracy on simulated data and real multiomics examples of labor-onset prediction. By leveraging aligned signals and allowing flexible fitting mechanisms for different modalities, cooperative learning offers a powerful approach to multiomics data fusion.
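In the linear case with two views X and Z, the cooperative objective can be written as ½‖y - Xθ_x - Zθ_z‖² + (ρ/2)‖Xθ_x - Zθ_z‖², plus the lasso penalty. Setting ρ = 0 gives ordinary least squares on the concatenated features (early fusion). The agreement term is itself a squared norm. The whole objective therefore equals an ordinary least-squares loss on an augmented design [[X, Z], [-√ρX, √ρZ]] with response [y, 0], so a standard lasso solver can be reused. This pure-Python sketch, with hypothetical helper names, checks that identity numerically. It omits the penalty and the solver.

```python
import math
import random


def matvec(M, v):
    """Row-major matrix times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def cooperative_loss(y, X, Z, theta_x, theta_z, rho):
    """0.5*||y - X tx - Z tz||^2 + (rho/2)*||X tx - Z tz||^2 (lasso term omitted)."""
    fx, fz = matvec(X, theta_x), matvec(Z, theta_z)
    fit = sum((yi - a - b) ** 2 for yi, a, b in zip(y, fx, fz))
    agree = sum((a - b) ** 2 for a, b in zip(fx, fz))
    return 0.5 * fit + 0.5 * rho * agree


def augmented_problem(y, X, Z, rho):
    """Stack [[X, Z], [-sqrt(rho) X, sqrt(rho) Z]] and the response [y, 0]."""
    r = math.sqrt(rho)
    top = [xr + zr for xr, zr in zip(X, Z)]
    bottom = [[-r * a for a in xr] + [r * b for b in zr] for xr, zr in zip(X, Z)]
    return top + bottom, list(y) + [0.0] * len(y)


def least_squares_loss(y, A, theta):
    return 0.5 * sum((yi - fi) ** 2 for yi, fi in zip(y, matvec(A, theta)))


rng = random.Random(0)
n = 20
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
Z = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
tx, tz, rho = [0.5, -0.2, 0.1], [0.3, 0.4], 0.7

A, y_aug = augmented_problem(y, X, Z, rho)
direct = cooperative_loss(y, X, Z, tx, tz, rho)
stacked = least_squares_loss(y_aug, A, tx + tz)
print(round(direct, 6), round(stacked, 6))
```

Because the augmented rows only add squared terms, any lasso routine that fits on (A, y_aug) minimizes the penalized cooperative objective for that fixed ρ. ρ is then chosen by validation, as the abstract describes.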


Subjects
Genomics , Neural Networks, Computer , Supervised Machine Learning , Genomics/methods
17.
Ann Appl Stat ; 16(3): 1891-1918, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36091495

ABSTRACT

In high-dimensional regression problems, often a relatively small subset of the features is relevant for predicting the outcome, and methods that impose sparsity on the solution are popular. When multiple correlated outcomes are available (multitask), reduced-rank regression is an effective way to borrow strength and capture latent structures that underlie the data. Our proposal is motivated by the UK Biobank population-based cohort study. There, we face large-scale, ultrahigh-dimensional features and have access to a large number of outcomes (phenotypes): lifestyle measures, biomarkers, and disease outcomes. We are hence led to fit sparse reduced-rank regression models, using computational strategies that allow us to scale to problems of this size. We use a scheme that alternates between solving the sparse regression problem and solving the reduced-rank decomposition. For the sparse regression component, we propose a scalable iterative algorithm based on adaptive screening that leverages the sparsity assumption and enables us to focus on solving much smaller subproblems. The full solution is reconstructed and tested via an optimality condition to make sure it is a valid solution for the original problem. We further extend the method to cope with practical issues, such as the inclusion of confounding variables and imputation of missing values among the phenotypes. Experiments on both synthetic data and the UK Biobank data demonstrate the effectiveness of the method and the algorithm. We present the multiSnpnet package, available at http://github.com/junyangq/multiSnpnet, which works on top of PLINK2 files. We anticipate it will be a valuable tool for generating polygenic risk scores from human genetic studies.

18.
J R Stat Soc Series B Stat Methodol ; 84(2): 524-546, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35910400

ABSTRACT

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set C(x) as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class and to detect outliers x as often as possible. BCOPS returns no prediction (corresponding to C(x) equal to the empty set) if it infers x to be an outlier. The proposed method combines supervised learning algorithms with conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given procedure. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
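The conformal step that gives such sets their distribution-free guarantee can be sketched in a few lines. For each class k, a score for x is ranked against held-out calibration scores from class k. Class k joins C(x) only if the resulting conformal p-value exceeds α, so a point unlike every class gets the empty set. This is a generic class-conditional split-conformal sketch and not BCOPS itself. BCOPS optimizes its score functions with supervised learners, whereas `score` here is a hypothetical negative distance to a class centre, assumed to come from a separate training split.

```python
import random


def conformal_p_value(cal_scores, test_score):
    """(1 + #calibration scores <= test score) / (n + 1).

    Higher scores mean "more typical of the class", so an atypical test point
    ranks near the bottom and gets a small p-value.
    """
    count = sum(1 for s in cal_scores if s <= test_score)
    return (count + 1) / (len(cal_scores) + 1)


def prediction_set(x, calibration, score, alpha=0.1):
    """Labels whose conformal p-value exceeds alpha (possibly the empty set).

    calibration maps label -> held-out points from that class;
    score(k, x) is a fixed score, larger meaning more like class k.
    """
    labels = set()
    for k, points in calibration.items():
        cal_scores = [score(k, p) for p in points]
        if conformal_p_value(cal_scores, score(k, x)) > alpha:
            labels.add(k)
    return labels


# Hypothetical 1-D example; class centres assumed fitted on a separate training split.
centres = {"a": 0.0, "b": 5.0}


def score(k, x):
    return -abs(x - centres[k])  # larger = closer to the class-k centre


rng = random.Random(0)
calibration = {k: [c + rng.gauss(0, 1) for _ in range(200)] for k, c in centres.items()}
for x in (0.1, 4.8, 20.0):
    print(x, sorted(prediction_set(x, calibration, score)))
```

Under exchangeability of a class-k test point with its calibration points, class k is missed with probability at most α, whatever the score function. The score only affects how small and informative the sets are.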

19.
PLoS Genet ; 18(3): e1010105, 2022 03.
Article in English | MEDLINE | ID: mdl-35324888

ABSTRACT

We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10⁻⁵) incremental predictive performance when compared against the covariate-only model that considers age, sex, types of genotyping arrays, and the principal component loadings of genotypes. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance (Spearman's ρ = 0.61, p = 2.2 × 10⁻⁵⁹ for quantitative traits; ρ = 0.21, p = 9.6 × 10⁻⁴ for binary traits). The sparse PRS model trained on European individuals showed limited transferability when evaluated on non-European individuals in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).
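Concretely, an individual's score from a sparse PRS model is a weighted sum of genotype dosages, Σ_j β_j g_j, over the selected variants. The reported association between model size and performance is a rank correlation. The following is a minimal sketch of both. It assumes dosages coded as 0 to 2 copies of the effect allele. The variant IDs and weights in the example are made up and are not weights from the Global Biobank Engine.

```python
def polygenic_risk_score(dosages, weights):
    """Sum of weight * dosage over the variants selected by a sparse model.

    dosages: variant ID -> allele dosage in [0, 2] (imputed values allowed);
    weights: variant ID -> effect size. Unselected variants have zero weight and
    are simply absent; a selected variant missing from `dosages` raises KeyError.
    """
    return sum(w * dosages[v] for v, w in weights.items())


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


# Made-up example: two selected variants, one individual.
weights = {"rs1": 0.2, "rs2": -0.1}
person = {"rs1": 2, "rs2": 1, "rs3": 0}
print(polygenic_risk_score(person, weights))
```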


Subjects
Genome-Wide Association Study , Multifactorial Inheritance , Biological Specimen Banks , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance/genetics , Phenotype , Risk Factors , United Kingdom
20.
Biostatistics ; 23(2): 522-540, 2022 04 13.
Article in English | MEDLINE | ID: mdl-32989444

ABSTRACT

We develop a scalable and highly efficient algorithm to fit a Cox proportional hazards model by maximizing the L1-regularized (Lasso) partial likelihood function. It is based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian and others (2019). Our algorithm is particularly suitable for large-scale, high-dimensional data that do not fit in memory. The output of our algorithm is the full Lasso path, that is, the parameter estimates at all predefined regularization parameters. It also reports their validation accuracy, measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow and others, 2015). We provide a publicly available implementation of the proposed approach for genetics data on top of the PLINK2 package and name it snpnet-Cox.
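The C-index used for validation can be illustrated with Harrell's definition for right-censored data. This assumes that a higher predicted risk, for example the linear predictor xᵀβ of a Cox model, should mean earlier failure. A pair is comparable when the subject with the shorter observed time had an event. The index is the fraction of comparable pairs in which that subject also has the higher risk, with tied risks counted as one half. This O(n²) sketch ignores pairs with tied times and is not the snpnet-Cox implementation.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times; events: 1 if the event occurred, 0 if censored;
    risks: predicted risk scores (higher = expected to fail sooner).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:  # i failed while j was still under follow-up
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Made-up example: five subjects, two of them censored.
print(harrell_c_index([5, 8, 3, 12, 7], [1, 0, 1, 1, 0], [0.9, 0.2, 1.4, -0.3, 0.5]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of failure times.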


Subjects
Algorithms , Biological Specimen Banks , Humans , Likelihood Functions , Proportional Hazards Models , United Kingdom